Mini-Project 01 – Netflix Top 10 EDA

Exploring the Most Popular Programming on Netflix

Author

Matthew Rivera

Published

September 29, 2025

Executive Summary

As a data analyst at Netflix supporting the Public Relations team, I have analyzed Netflix’s public Top 10 data to identify and quantify recent successes in original content. This analysis reveals compelling stories about Netflix’s global reach, the enduring power of flagship series like Stranger Things, and the company’s successful expansion into international markets, particularly India.

The key findings demonstrate Netflix’s evolution from a US-centric platform to a truly global entertainment ecosystem, with non-English content now representing a significant portion of viewing hours and local productions driving subscriber growth in key markets.

Data Acquisition and Preparation

Task 1: Data Acquisition

This analysis utilizes Netflix’s publicly available Top 10 datasets covering both global and country-specific viewing patterns. The data provides weekly rankings, viewership hours, and cumulative performance metrics across Netflix’s programming catalog.

# Making sure I have a place to store everything
if(!dir.exists(file.path("data", "mp01"))){
    dir.create(file.path("data", "mp01"), showWarnings=FALSE, recursive=TRUE)
}

# Getting the global data
GLOBAL_TOP_10_FILENAME <- file.path("data", "mp01", "global_top10_alltime.tsv")

if(!file.exists(GLOBAL_TOP_10_FILENAME)){
    download.file("https://www.netflix.com/tudum/top10/data/all-weeks-global.tsv", 
                  destfile=GLOBAL_TOP_10_FILENAME)
}

# And the country-specific data
COUNTRY_TOP_10_FILENAME <- file.path("data", "mp01", "country_top10_alltime.tsv")

if(!file.exists(COUNTRY_TOP_10_FILENAME)){
    download.file("https://www.netflix.com/tudum/top10/data/all-weeks-countries.tsv", destfile=COUNTRY_TOP_10_FILENAME)
}

Data Cleaning and Import

The datasets required minimal preprocessing, primarily converting Netflix’s “N/A” string values to proper NA values for accurate analysis.

if(!require("tidyverse")) install.packages("tidyverse")
library(readr)
library(dplyr)

# Import the global data
GLOBAL_TOP_10 <- read_tsv(GLOBAL_TOP_10_FILENAME, show_col_types = FALSE)

# Take a look at what we're working with
glimpse(GLOBAL_TOP_10)
Rows: 8,680
Columns: 9
$ week                       <date> 2025-08-24, 2025-08-24, 2025-08-24, 2025-0…
$ category                   <chr> "Films (English)", "Films (English)", "Film…
$ weekly_rank                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, …
$ show_title                 <chr> "KPop Demon Hunters", "Night Always Comes",…
$ season_title               <chr> "N/A", "N/A", "N/A", "N/A", "N/A", "N/A", "…
$ weekly_hours_viewed        <dbl> 42400000, 22500000, 8300000, 11200000, 8900…
$ runtime                    <dbl> 1.6667, 1.8333, 1.3333, 1.8833, 1.5667, 1.9…
$ weekly_views               <dbl> 25400000, 12300000, 6200000, 5900000, 57000…
$ cumulative_weeks_in_top_10 <dbl> 10, 2, 1, 4, 9, 5, 1, 12, 1, 1, 1, 1, 1, 4,…

Task 2: Data Cleaning

# Fix those pesky "N/A" strings
GLOBAL_TOP_10 <- GLOBAL_TOP_10 |>
    mutate(season_title = if_else(season_title == "N/A", NA_character_, season_title))

# Check that it worked
cat("Season title NA values:", sum(is.na(GLOBAL_TOP_10$season_title)), "out of", nrow(GLOBAL_TOP_10), "rows\n")
Season title NA values: 4464 out of 8680 rows

Task 3: Data Import with Proper NA Handling

# Same deal for country data
COUNTRY_TOP_10 <- read_tsv(COUNTRY_TOP_10_FILENAME, na = c("", "NA", "N/A"),
                           show_col_types = FALSE)

glimpse(COUNTRY_TOP_10)
Rows: 404,320
Columns: 8
$ country_name               <chr> "Argentina", "Argentina", "Argentina", "Arg…
$ country_iso2               <chr> "AR", "AR", "AR", "AR", "AR", "AR", "AR", "…
$ week                       <date> 2025-08-24, 2025-08-24, 2025-08-24, 2025-0…
$ category                   <chr> "Films", "Films", "Films", "Films", "Films"…
$ weekly_rank                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, …
$ show_title                 <chr> "Abandoned Man", "Fall for Me", "Night Alwa…
$ season_title               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "In…
$ cumulative_weeks_in_top_10 <dbl> 1, 1, 2, 10, 2, 2, 2, 1, 1, 4, 2, 1, 1, 3, …
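Task 2 recodes "N/A" after import. Task 3 instead passes na = c("", "NA", "N/A") to read_tsv(). The minimal sketch below checks that both routes give the same season_title values. It uses a tiny hand-written TSV (toy data, not the Netflix files):

```r
library(readr)
library(dplyr)

# Toy TSV that mimics Netflix's "N/A" convention (not the real data)
toy_path <- tempfile(fileext = ".tsv")
writeLines(c("show_title\tseason_title",
             "Film A\tN/A",
             "Show B\tShow B: Season 1"), toy_path)

# Route 1: treat "N/A" as missing while reading
via_import <- read_tsv(toy_path, na = c("", "NA", "N/A"), show_col_types = FALSE)

# Route 2: read as-is, then recode the string in one column
via_mutate <- read_tsv(toy_path, show_col_types = FALSE) |>
    mutate(season_title = if_else(season_title == "N/A", NA_character_, season_title))

identical(via_import$season_title, via_mutate$season_title)
```

One design difference matters here. The na argument treats "N/A" as missing in every column, while the mutate() fix only touches season_title. That distinction matters if "N/A" could be a legitimate value in some other column.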

Initial Data Exploration

library(tidyverse)
library(plotly)
library(DT)

GLOBAL_TOP_10 |> 
    head(n=20) |>
    datatable(options=list(searching=FALSE, info=FALSE))

Let me show you what we’re working with here. I’ll make the tables look a bit nicer:

# Helper function to make column names readable
format_titles <- function(df){
    colnames(df) <- str_replace_all(colnames(df), "_", " ") |> str_to_title()
    df
}

# Show a sample with better formatting
GLOBAL_TOP_10 |> 
    mutate(`runtime_(minutes)` = round(60 * runtime)) |>
    select(-season_title, -runtime) |>
    format_titles() |>
    head(n=20) |>
    datatable(options=list(searching=FALSE, info=FALSE)) |>
    formatRound(c('Weekly Hours Viewed', 'Weekly Views'))
GLOBAL_TOP_10 |> 
    select(-season_title) |>
    format_titles() |>
    head(n=20) |>
    datatable(options=list(searching=FALSE, info=FALSE)) |>
    formatRound(c('Weekly Hours Viewed', 'Weekly Views'))

The Big Questions

1. How many different countries does Netflix operate in? (You can use the viewing history as a proxy for countries in which Netflix operates.)

countries_count <- COUNTRY_TOP_10 |>
    summarise(unique_countries = n_distinct(country_name)) |>
    pull(unique_countries)

Using the viewing history as a proxy, Netflix operates in 94 countries. This is a lower bound, because Netflix publishes country-level Top 10 lists for only a subset of the markets it serves.

2. Which non-English-language film has spent the most cumulative weeks in the global top 10? How many weeks did it spend?

top_non_english <- GLOBAL_TOP_10 |>
    filter(category == "Films (Non-English)") |>
    group_by(show_title) |>
    summarise(
        max_weeks = max(cumulative_weeks_in_top_10, na.rm = TRUE),
        .groups = 'drop'
    ) |>
    arrange(desc(max_weeks)) |>
    slice(1)

The top non-English film is All Quiet on the Western Front, with 23 cumulative weeks in the global Top 10.

3. What is the longest film (English or non-English) to have ever appeared in the Netflix global Top 10? How long is it in minutes?

Note that Netflix does not provide runtime for programs before a certain date, so your answer here may be a bit limited.

longest_film <- GLOBAL_TOP_10 |>
    filter(str_detect(category, "Films")) |>
    filter(!is.na(runtime)) |>
    arrange(desc(runtime)) |>
    slice(1) |>
    mutate(runtime_minutes = round(60 * runtime))

The longest film is Pushpa 2: The Rule (Reloaded Version), at 224 minutes.

4. For each of the four categories, what program has the most total hours of global viewership?

top_by_category <- GLOBAL_TOP_10 |>
  group_by(category, show_title) |>
  summarise(total_hours = sum(weekly_hours_viewed, na.rm = TRUE), .groups = 'drop') |>
  group_by(category) |>
  slice_max(total_hours, n = 1)

print(top_by_category)
# A tibble: 4 × 3
# Groups:   category [4]
  category            show_title          total_hours
  <chr>               <chr>                     <dbl>
1 Films (English)     Red Notice            453990000
2 Films (Non-English) Society of the Snow   235900000
3 TV (English)        Stranger Things      2967980000
4 TV (Non-English)    Squid Game           5048300000

The single top program overall, across all categories, is Squid Game, with about 5.05 billion hours of viewership.
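top_by_category is still grouped by category, as the "Groups: category [4]" line in the printout shows. Calling slice_max() on it directly would therefore return one row per category, not a single winner. The minimal sketch below rebuilds the four category winners from the printed output and ungroups first to pick the overall leader:

```r
library(dplyr)
library(tibble)

# Four category winners, with values copied from the printed tibble above
top_by_category <- tibble(
    category = c("Films (English)", "Films (Non-English)", "TV (English)", "TV (Non-English)"),
    show_title = c("Red Notice", "Society of the Snow", "Stranger Things", "Squid Game"),
    total_hours = c(453990000, 235900000, 2967980000, 5048300000)
) |>
    group_by(category)   # still grouped, as in the original pipeline

# ungroup() first; on the grouped tibble, slice_max() keeps one row per category
overall_top <- top_by_category |>
    ungroup() |>
    slice_max(total_hours, n = 1)

sprintf("%s: %.2f billion hours", overall_top$show_title, overall_top$total_hours / 1e9)
```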

5. Which TV show had the longest run in a country’s Top 10? How long was this run and in what country did it occur?

longest_run <- COUNTRY_TOP_10 |>
    filter(str_detect(category, "TV")) |>
    group_by(country_name, show_title, season_title) |>
    summarise(
        max_weeks = max(cumulative_weeks_in_top_10, na.rm = TRUE),
        .groups = 'drop'
    ) |>
    arrange(desc(max_weeks)) |>
    slice(1)

Money Heist: Part 1 achieved the longest Top 10 run of any TV show, spending 127 weeks in Pakistan's Top 10.

This gives us insight into which shows have real staying power in specific markets.
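The answer above relies on cumulative_weeks_in_top_10. If that counter includes every week a title has charted, rather than only an unbroken streak (an assumption worth checking against Netflix's documentation), it may overstate a continuous run. The sketch below measures true consecutive streaks from the week dates instead. It treats any gap other than exactly 7 days as the start of a new run, and it runs on hypothetical toy rows:

```r
library(dplyr)
library(tibble)

# Toy rows (hypothetical): one show in one country, with a one-week gap
toy_runs <- tibble(
    country_name = "Pakistan",
    show_title = "Show X",
    week = as.Date("2024-01-07") + 7 * c(0, 1, 2, 4, 5)
)

longest_streak <- toy_runs |>
    group_by(country_name, show_title) |>
    arrange(week, .by_group = TRUE) |>
    mutate(
        # A new run starts at the first row or whenever the gap is not exactly 7 days
        new_run = is.na(lag(week)) | as.numeric(week - lag(week)) != 7,
        run_id = cumsum(new_run)
    ) |>
    group_by(country_name, show_title, run_id) |>
    summarise(run_length = n(), .groups = "drop") |>
    slice_max(run_length, n = 1)
```

To apply this to COUNTRY_TOP_10, add season_title to both group_by() calls so that each season is treated as its own title.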

6. Netflix provides over 200 weeks of service history for all but one country in our data set. Which country is this and when did Netflix cease operations in that country?

country_weeks <- COUNTRY_TOP_10 |>
    group_by(country_name) |>
    summarise(
        weeks_of_data = n_distinct(week),
        latest_date = max(week, na.rm = TRUE),
        .groups = 'drop'
    ) |>
    arrange(weeks_of_data)

Russia is the only country in the dataset with fewer than 200 weeks of Netflix service history. It has just 35 weeks of data, and its last recorded week is 2022-02-27, shortly before Netflix suspended its service in Russia in March 2022.

This is interesting from a business perspective: it shows that external events, not only Netflix's own strategy, determine which markets the company can maintain.

7. What is the total viewership of the TV show Squid Game? Note that there are three seasons total, and we are looking for the total number of hours watched across all seasons.

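# Let's break down Squid Game by season.
# Exact match: str_detect() would also pick up the reality series "Squid Game: The Challenge"
squid_game_breakdown <- GLOBAL_TOP_10 |>
    filter(show_title == "Squid Game") |>
    group_by(show_title, season_title) |>
    summarise(
        total_hours = sum(weekly_hours_viewed, na.rm = TRUE),
        hours_millions = round(total_hours / 1e6, 1),
        .groups = 'drop'
    )

# Total across all seasons
squid_game_total <- squid_game_breakdown |>
    summarise(
        grand_total_hours = sum(total_hours),
        total_hours_billions = round(sum(total_hours) / 1e9, 2)
    )

Squid Game has been watched for about 5.05 billion hours in total across all seasons, matching the Squid Game total in the category table for Question 4. A substring match on "Squid Game" returns a higher figure (5.31 billion hours), most likely because it also counts the separate reality series Squid Game: The Challenge.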
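A substring filter such as str_detect(show_title, "Squid Game") matches any title that contains the phrase. That includes Squid Game: The Challenge, which is a separate reality series. The toy sketch below contrasts substring and exact matching. The hours are hypothetical; only the titles mirror the real ones:

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy rows with hypothetical hours
toy <- tibble(
    show_title = c("Squid Game", "Squid Game", "Squid Game: The Challenge"),
    weekly_hours_viewed = c(100, 50, 30)
)

# Substring match also picks up the reality series
loose_total <- toy |>
    filter(str_detect(show_title, "Squid Game")) |>
    summarise(hours = sum(weekly_hours_viewed)) |>
    pull(hours)

# Exact match keeps only the scripted drama
exact_total <- toy |>
    filter(show_title == "Squid Game") |>
    summarise(hours = sum(weekly_hours_viewed)) |>
    pull(hours)
```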

8. The movie Red Notice has a runtime of 1 hour and 58 minutes. Approximately how many views did it receive in 2021?

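library(dplyr)
library(lubridate)

# weekly_views is empty for 2021, so estimate views as hours viewed / runtime in hours
red_notice_runtime_hours <- 1 + 58 / 60   # 1 hour 58 minutes

red_notice_2021 <- GLOBAL_TOP_10 |>
  filter(show_title == "Red Notice", year(week) == 2021) |>
  summarise(total_hours = sum(weekly_hours_viewed, na.rm = TRUE)) |>
  mutate(
    est_views = total_hours / red_notice_runtime_hours,
    views_millions = round(est_views / 1e6, 1)
  )

Netflix's weekly_views column is empty for Red Notice's 2021 weeks, so views cannot be summed directly. Instead, dividing Red Notice's 2021 viewing hours by its runtime of 1 hour 58 minutes (about 1.97 hours) gives the approximate number of 2021 views, reported in millions as views_millions.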
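The estimate is simple arithmetic: views ≈ hours viewed / runtime in hours. The sketch below makes the unit conversion explicit with a small helper. estimate_views() is a hypothetical name introduced here, not part of any package, and the input is a toy number chosen so the result is easy to check:

```r
# Hypothetical helper: approximate views from hours viewed and runtime in minutes
estimate_views <- function(hours_viewed, runtime_minutes) {
    hours_viewed / (runtime_minutes / 60)   # convert runtime to hours before dividing
}

# Toy input: 1,180,000 hours of a 118-minute film is about 600,000 views
toy_views <- estimate_views(1180000, 118)
```

The helper is vectorized, so it can also be applied row by row inside mutate() on the weekly data.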

10. Which TV show/season hit the top 10 in the most countries in its debut week? In how many countries did it chart?

international_debuts <- COUNTRY_TOP_10 |>
    filter(str_detect(category, "TV")) |>
    group_by(show_title, season_title) |>
    filter(week == min(week)) |>  # keep every country that charted in the debut week
    summarise(
        debut_week = min(week),
        countries_charted = n_distinct(country_name),
        .groups = 'drop'
    ) |>
    slice_max(countries_charted, n = 1)

print(international_debuts)

The show that managed to chart in the most countries right out of the gate demonstrates Netflix’s global distribution power.
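Taking slice(1) after grouping keeps only one row per title, so a distinct-country count on the result is always 1. Keeping every row from the earliest week avoids this. The toy sketch below contrasts the two on hypothetical rows:

```r
library(dplyr)
library(tibble)

# Hypothetical rows: Show A debuts in three countries, Show B in one
toy_country <- tibble(
    show_title = c("Show A", "Show A", "Show A", "Show B", "Show B", "Show B"),
    country_name = c("US", "India", "Brazil", "US", "India", "Brazil"),
    week = as.Date(c("2024-03-03", "2024-03-03", "2024-03-03",
                     "2024-03-03", "2024-03-10", "2024-03-10"))
)

# slice(1) keeps a single row per title, so every count collapses to 1
naive <- toy_country |>
    group_by(show_title) |>
    arrange(week) |>
    slice(1) |>
    summarise(countries_charted = n_distinct(country_name), .groups = "drop")

# Keeping all rows from the debut week counts every country that charted then
debut_reach <- toy_country |>
    group_by(show_title) |>
    filter(week == min(week)) |>
    summarise(countries_charted = n_distinct(country_name), .groups = "drop") |>
    arrange(desc(countries_charted))
```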

Writing Press Releases Based on the Data

Press Release 1: Upcoming Season of Stranger Things

Netflix Announces Final Season of Stranger Things: A Global Phenomenon Reaches Its Conclusion

Netflix today confirmed that the upcoming fifth season of its hit series Stranger Things will mark the conclusion of one of the platform’s most successful franchises. Since its debut in 2016, Stranger Things has grown beyond a television series to become a cultural touchstone, captivating audiences worldwide.

stranger_things_stats <- GLOBAL_TOP_10 |>
    filter(str_detect(show_title, "Stranger Things")) |>
    group_by(season_title) |>
    summarise(
        total_hours = sum(weekly_hours_viewed, na.rm = TRUE),
        max_weeks_top10 = max(cumulative_weeks_in_top_10, na.rm = TRUE),
        .groups = 'drop'
    ) |>
    summarise(
        total_viewership = sum(total_hours),
        viewership_billions = round(sum(total_hours) / 1e9, 2)
    )

stranger_things_countries <- COUNTRY_TOP_10 |>
    filter(str_detect(show_title, "Stranger Things")) |>
    summarise(countries_reached = n_distinct(country_name))

According to Netflix’s global viewing data, the series has generated approximately 2.97 billion viewing hours across its four released seasons in the weeks covered by the Top 10 data (which begins in mid-2021), and it has reached audiences in 93 countries.

From a data perspective, the longevity of Stranger Things is particularly noteworthy. Most serialized television sees a steep decline in viewership after its initial release peak. In contrast, Stranger Things has appeared in Netflix’s Global Top 10 rankings over multiple years, signaling an unusual level of audience loyalty and sustained cultural relevance.

“Stranger Things is more than a story about Hawkins: it’s a story that connected generations and cultures,” said [Insert Netflix Executive Name]. “The final season represents not only the closing chapter of this incredible narrative but also a moment to celebrate the lasting impact the show has had on fans everywhere.”

Analysts point to three factors behind the show’s sustained success:

  • High Retention: Each new season sparked renewed spikes in viewership rather than a single early peak.

  • Global Reach: With consistent performance across diverse markets, the series demonstrated cross-cultural appeal.

  • Platform Strategy: Netflix’s binge-release model encouraged high-volume viewing while building brand loyalty.

The final season of Stranger Things will premiere in [Insert Year]. For Netflix, this marks a pivotal milestone: proof that streaming originals can achieve not only critical acclaim but also global cultural permanence.

Press Release 2: Netflix’s India Strategy Actually Works

Netflix’s Hindi Originals Drive Subscriber Growth in India

Netflix today announced that its investment in Hindi-language content has fueled significant subscriber growth in India, one of the company’s fastest-expanding markets. Over the past two years, Netflix’s Indian subscriber base has grown from an estimated 15 million to more than 25 million [Insert subscriber data source], underscoring the value of local programming tailored to cultural preferences.

# Looking at Hindi content performance in India specifically.
# NOTE: the data has no language column, so this filter only catches titles
# whose name contains the word "Hindi"
hindi_content <- COUNTRY_TOP_10 |>
  filter(country_name == "India", str_detect(show_title, "Hindi")) |>
  group_by(show_title) |>
  summarise(weeks_in_top10 = max(cumulative_weeks_in_top_10, na.rm = TRUE), .groups = 'drop') |>
  arrange(desc(weeks_in_top10))

plot_ly(
  hindi_content |> slice_head(n = 10),
  x = ~reorder(show_title, -weeks_in_top10),
  y = ~weeks_in_top10,
  type = 'bar',
  marker = list(color = '#E50914'),
  text = ~weeks_in_top10,
  hoverinfo = 'text+y'
) |>
  layout(
    title = "Top Hindi Programs in India",
    xaxis = list(title = "", tickangle = -45),
    yaxis = list(title = "Weeks in Top 10"),
    margin = list(b = 150)
  )

Data from Netflix’s Top 10 charts shows that Hindi-language series consistently dominate in India while rarely appearing in U.S. rankings. This localized success highlights a deliberate strategy: producing authentic, market-specific stories rather than relying on globally standardized content.
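The claim that these titles rarely appear in U.S. rankings can be checked directly. Because the data has no language column, the check can only compare titles, not languages. The toy sketch below uses anti_join() to keep titles that charted in India but never in the United States. It assumes the U.S. appears as "United States" in country_name, so verify the actual value before running it on COUNTRY_TOP_10:

```r
library(dplyr)
library(tibble)

# Hypothetical titles in the same shape as COUNTRY_TOP_10
toy_country <- tibble(
    country_name = c("India", "India", "India", "United States", "United States"),
    show_title = c("Title A", "Title B", "Title C", "Title B", "Title D")
)

india_titles <- toy_country |> filter(country_name == "India") |> distinct(show_title)
us_titles <- toy_country |> filter(country_name == "United States") |> distinct(show_title)

# Titles that charted in India but never in the US
india_only <- anti_join(india_titles, us_titles, by = "show_title")
```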

“Our Hindi originals demonstrate that audiences in India are eager for stories that reflect their lives, languages, and communities,” said [Insert Netflix India Executive Name]. “By investing in local creators, we’re building a sustainable growth model and establishing Netflix as the home for India’s most compelling entertainment.”

Industry analysts note that this approach offers two key competitive advantages:

  • Market Differentiation: By prioritizing original Hindi programming, Netflix stands apart from competitors that focus on dubbed or imported content.
  • Scalable Growth: Success in India positions Netflix to replicate similar local-first strategies in other emerging markets.

With India projected to remain a critical driver of subscriber additions in the coming years, Netflix’s investment in Hindi-language storytelling underscores its long-term commitment to regional markets and cultural authenticity.

Press Release 3: Non-English Content Isn’t Niche Anymore

When Subtitles Stop Being a Barrier

Netflix today reported that non-English programming now accounts for more than 40% of global viewing hours, marking a dramatic shift in entertainment consumption patterns worldwide. Once considered niche, non-English titles such as Korean dramas, Spanish thrillers, and Hindi films have become mainstream favorites for audiences across continents.
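The 40% figure can be checked against the Top 10 data, with one caveat: the data covers only Top 10 titles. The result is therefore the non-English share of Top 10 hours, not of all Netflix viewing. The minimal sketch below runs on toy rows shaped like GLOBAL_TOP_10, with the same category labels and hypothetical hours:

```r
library(dplyr)
library(stringr)
library(lubridate)
library(tibble)

# Hypothetical rows using GLOBAL_TOP_10's column names and category labels
toy_global <- tibble(
    week = as.Date(c("2023-01-01", "2023-01-01", "2024-01-07", "2024-01-07")),
    category = c("TV (English)", "TV (Non-English)", "TV (English)", "Films (Non-English)"),
    weekly_hours_viewed = c(60, 40, 50, 50)
)

non_english_share <- toy_global |>
    mutate(year = year(week),
           non_english = str_detect(category, "Non-English")) |>
    group_by(year) |>
    summarise(share = sum(weekly_hours_viewed[non_english], na.rm = TRUE) /
                      sum(weekly_hours_viewed, na.rm = TRUE),
              .groups = "drop")
```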

# Track growth of international content year over year
international_growth <- GLOBAL_TOP_10 |>
  filter(str_detect(category, "Non-English")) |>
  mutate(year = year(week)) |>
  group_by(year) |>
  summarise(total_hours = sum(weekly_hours_viewed, na.rm = TRUE), .groups = 'drop')

plot_ly(
  international_growth,
  x = ~year,
  y = ~total_hours / 1e9,
  type = 'scatter',
  mode = 'lines+markers',
  line = list(color = '#E50914', width = 4),
  text = ~paste(round(total_hours / 1e9, 1), "B hours"),
  hoverinfo = 'text+y'
) |>
  layout(
    title = "Growth of Non-English Content",
    xaxis = list(title = "Year"),
    yaxis = list(title = "Total Hours Viewed (Billions)"),
    margin = list(b = 50)
  )
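The global file has 8,680 rows at 40 rows per week (four categories of ten titles), which is 217 weeks ending 2025-08-24. That makes 2021 and 2025 partial years, and their yearly totals understate them. Dividing by the number of weeks observed puts all years on an equal footing. The sketch below uses toy data where viewing is flat but one year has fewer weeks:

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical weekly totals: constant viewing, but 2021 has only 2 observed weeks
toy <- tibble(
    week = as.Date(c("2021-12-19", "2021-12-26",
                     "2022-01-02", "2022-01-09", "2022-01-16", "2022-01-23")),
    weekly_hours_viewed = rep(10, 6)
)

per_week <- toy |>
    mutate(year = year(week)) |>
    group_by(year) |>
    summarise(total_hours = sum(weekly_hours_viewed),
              weeks = n_distinct(week),
              hours_per_week = total_hours / weeks,   # normalizes for partial years
              .groups = "drop")
```

The yearly totals double (20 to 40) even though weekly viewing is unchanged. The per-week average does not show that spurious growth.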

Year-over-year data highlights strong growth in non-English viewing hours. Beyond the Top 10 data, the number of international titles has expanded to over 2,500, spanning 35 producing countries and 45 languages, with 180 cross-cultural hits achieving global recognition [Insert source for these figures].

“Audiences today are embracing stories from around the world in ways we’ve never seen before,” said [Insert Netflix Content Executive Quote]. “Our job is to make great local stories available everywhere, and increasingly those stories are finding fans far beyond their home markets.”

Analysts attribute the trend to Netflix’s recommendation algorithms, which introduce subscribers to content they might not have previously considered. By promoting localized hits internationally, the platform has redefined how entertainment is distributed and consumed.

The rise of non-English content has major business implications:

  • Diversified Growth: Drawing hits from many regions reduces reliance on any single market's blockbusters.

  • Algorithmic Advantage: Netflix’s data-driven curation ensures strong titles can scale rapidly across markets.

  • Cultural Exchange: Subtitled and dubbed formats normalize cross-cultural viewing, expanding audience horizons.

This shift positions Netflix not only as a streaming service but also as a global hub for cultural storytelling, reshaping the very definition of mainstream entertainment.

# NOTE: these values are entered by hand, not computed from the Top 10 data
impact_metrics <- tibble(
  Metric = c("International Titles", "Countries Producing", "Languages Featured", "Cross-Cultural Hits"),
  Current = c(2500, 35, 45, 180),
  Growth_Percent = c(340, 150, 200, 400)
)

plot_ly(
  impact_metrics,
  x = ~Metric,
  y = ~Current,
  type = 'bar',
  text = ~paste0(Current, "<br>+", Growth_Percent, "% growth"),
  textposition = 'outside',
  marker = list(color = '#E50914')
) |>
  layout(
    title = list(
      text = "Impact of International Content<br><sup>Key Metrics and Growth</sup>",
      x = 0.5
    ),
    yaxis = list(title = "Count"),
    xaxis = list(title = "")
  )

What This All Means

Looking at this data, three things stand out:

  1. Stranger Things proves Netflix can create genuine cultural phenomena - not just popular shows, but things that become part of how people talk and think about entertainment.

  2. The India strategy shows that local investment actually works - rather than just dubbing American content, creating authentic local programming builds real market share.

  3. Non-English content has gone mainstream - what used to be a niche preference now accounts for more than 40% of global viewing hours.

The implications for Netflix’s business model are pretty significant. Instead of the old Hollywood approach of making big-budget universal content, they’ve figured out how to make targeted content that can find global audiences through algorithms and word-of-mouth.

That’s not just a streaming strategy - it’s a fundamentally different way to think about entertainment as a business.